Conversation

@igor0 igor0 commented Dec 15, 2025

An open-source library built on the Context Engine SDK that makes diverse sources searchable across agents and apps

Context Connectors enables users to:

  • Build indexes from Git repos (GitHub, GitLab, BitBucket), documentation websites, or local filesystem: index code, documentation, runbooks, schemas, configs, and more. Use DirectContext in the Context Engine SDK for custom sources.
  • Store indexes on a local filesystem for fast & simple access, or in S3 for persistent storage in production apps.
  • Search indexes via interactive agent, MCP for AI integrations, CLI for quick searches, or DirectContext in the Context Engine SDK for custom implementations.

igor0 added 30 commits December 15, 2025 06:19
Switch from openai() to openai.chat() to use the Chat Completions API
instead of the Responses API. The Responses API is stateful and generates
server-side IDs (fc_...) for function calls that are not persisted for
Zero Data Retention (ZDR) organizations, causing multi-step tool calls
to fail.

The Chat Completions API is stateless and works correctly with ZDR.
- OpenAI: gpt-5.2 → gpt-5-mini
- Anthropic: claude-sonnet-4-5 → claude-haiku-4-5
- Google: gemini-3-pro → gemini-3-flash-preview

Also adds Phase 10 test results documenting:
- ZDR compatibility fix (openai.chat vs openai)
- Model availability testing
- Multi-provider verification
- Add ./clients export path to package.json for programmatic API access
- Export createMCPServer, runMCPServer, MCPServerConfig from clients module
- Document Phase 11 programmatic API test results in test-results.md
Flip the default behavior: file tools (listFiles, readFile) are now
enabled by default. Use --search-only to disable them.

This is more intuitive: users get full functionality by default and
explicitly opt out when they only want the search tool.

- cmd-mcp: --search-only disables list_files/read_file tools
- cmd-agent: --search-only disables listFiles/readFile tools
- cmd-search: --search-only disables file access
…ansport

- Add mcp-http-server.ts with runMCPHttpServer() and createMCPHttpServer()
- Add mcp-serve CLI command with --port, --host, --cors, --base-path, --api-key options
- Support API key authentication via Authorization: Bearer header
- Support CORS for browser-based clients
- Update README with HTTP server documentation and examples
Updated tool descriptions for search, list_files, and read_file to be more
detailed and informative, adapting from Auggie CLI while keeping content
appropriate for context-connectors:

- Added multi-line descriptions with features and usage notes
- Included condensed regex syntax guide for searchPattern
- Clarified parameter semantics (1-based, inclusive, relative paths)
- Removed coding-specific language to support general use cases

Files modified:
- src/clients/mcp-server.ts
- src/clients/cli-agent.ts
…-specific store paths

- Rename -k, --key flag to -n, --name across all CLI commands
- Change default store location from CWD-relative .context-connectors to:
  - Linux: ~/.local/share/context-connectors (XDG Base Directory spec)
  - macOS: ~/Library/Application Support/context-connectors
  - Windows: %LOCALAPPDATA%\context-connectors
- Add CONTEXT_CONNECTORS_STORE_PATH environment variable override
- Priority order: --store-path CLI option > env var > platform default
- Update README.md with new flag, Data Storage section, and env var docs
Replace the importFromFile temp-file pattern with a direct import() call.
This is platform-neutral and avoids unnecessary filesystem operations.
Replace flat --source flag with subcommands for each source type:
- index filesystem (alias: fs)
- index github
- index gitlab
- index bitbucket
- index website

Each subcommand now shows only relevant options in --help.
Extracted shared store options into reusable helper functions.
- Make --name optional for mcp and mcp-serve commands (default: all indexes)
- Accept multiple index names with -n/--name <names...>
- Discover available indexes at startup from store
- Include available indexes in tool descriptions
- Add index_name parameter to all tools (search, list_files, read_file)
- Lazy initialization of SearchClient per index on first use
- Cache initialized clients for reuse
Show source type, identifier, and relative sync time for each index:

  NAME          SOURCE                     SYNCED
  augment-docs  github://augmentcode/docs  1d ago
  lm-plot       github://igor0/lm-plot     1d ago
- Change SourceMetadata to discriminated union with typed config per source
- Store original ref (branch/tag) separately from resolvedRef (SHA)
- Enable future re-indexing by preserving all source parameters
- Add getSourceIdentifier() and getResolvedRef() helper functions
- Maintain backward compatibility with legacy format indexes
- Update all sources, consumers, and tests
- sync <name> updates a single index using stored config
- sync --all updates all indexes
- Keeps 'index' command name (clearer than 'add')
- Add explicit path format examples with ✅/❌ to prevent /repo confusion
- Add example output for search results
- Use consistent 'repository root' terminology
- Add clearer parameter documentation with inline defaults
- Improve regex guidance with specific unsupported pattern examples
igor0 added 2 commits January 9, 2026 21:20
The crawler used url.href (includes querystring) for de-duplication
but url.pathname for storage paths. This caused URLs like /page?foo=1
and /page?bar=2 to both be crawled but stored to the same path,
creating duplicates or overwrites.

Now we use pathname consistently for both de-duplication and storage,
ensuring each page is only crawled once regardless of query params.
The crawler uses url.href for de-duplication (different query params =
different content, like browser caching). Now the storage path also
includes a sanitized query string, so each unique URL gets its own file.

Examples:
- /articles?page=1 -> articles_page=1.md
- /articles?page=2 -> articles_page=2.md
- /search?q=cats   -> search_q=cats.md

This preserves paginated content and search results that vary by query.
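The mapping above can be sketched as a small helper (hypothetical, not the actual crawler code), assuming the storage filename is derived from the pathname plus a sanitized query string:

```typescript
// Derive a storage filename from a URL, keeping a sanitized query string
// so /articles?page=1 and /articles?page=2 map to distinct files.
function storagePathFor(rawUrl: string): string {
  const url = new URL(rawUrl);
  // Strip the leading "/" and flatten path separators.
  const base = url.pathname.replace(/^\/+/, "").replace(/\//g, "_") || "index";
  // Sanitize the query: drop "?" and replace characters unsafe in filenames.
  const query = url.search.slice(1).replace(/[^\w=&.-]/g, "_");
  return query ? `${base}_${query}.md` : `${base}.md`;
}
```

Two URLs that differ only in their query string now land in different files, matching the examples above.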
igor0 commented Jan 9, 2026

augment review

@augment-app-staging augment-app-staging bot left a comment

Review completed. 5 suggestions posted.

}

// Check if request is for MCP endpoint
if (!url.pathname.startsWith(basePath)) {

Using startsWith(basePath) can also match paths like /mcp2 when basePath is /mcp, potentially exposing the handler on unintended routes. Consider enforcing an exact match or a basePath/ boundary.

addressed

addressed

}
} catch (error) {
if (!res.headersSent) {
res.writeHead(500, { "Content-Type": "application/json" });

Errors from parseBody (invalid JSON / body too large) currently fall into this catch and return a 500 JSON-RPC error. That makes client-side mistakes look like server failures and can cause noisy retries; consider mapping these to 4xx responses.

addressed

const dir = options.saveContent;
await fs.mkdir(dir, { recursive: true });
for (const file of files) {
const filePath = path.join(dir, file.path);

When --save-content is used, path.join(dir, file.path) will honor .. segments or absolute paths inside file.path, which could write outside the target directory if the crawled site emits unexpected URLs. Consider normalizing/sanitizing file.path before using it as a filesystem path.

addressed

indexKey = keyPath;
} else {
const prefix = keyPath.slice(0, lastSlash + 1);
indexKey = keyPath.slice(lastSlash + 1);

If the S3 URL ends with a trailing /, indexKey becomes an empty string here (keyPath.slice(lastSlash + 1)), which likely yields an invalid index name / object key. Consider validating keyPath doesn’t end with / (or trimming) before computing indexKey.

addressed

// Import previous context if available (for client-side deduplication),
// otherwise create a new one
const context = previousState
? await DirectContext.import(previousState.contextState, {

Full reindex imports previousState.contextState into the context for deduplication, but fetchAll() may omit files that were deleted since the last run. If addToIndex is additive, those deleted files could persist in the index; consider whether full reindex should start from an empty context or explicitly remove missing paths.

igor0 added 5 commits January 9, 2026 21:45
Using startsWith(basePath) could match unintended routes like /mcp2
when basePath is /mcp. Now we check for exact match OR subpath with
a / boundary, preventing the handler from being exposed on unintended
routes.
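The boundary check described above can be sketched as follows (the function name is illustrative, not the actual implementation):

```typescript
// Match the base path exactly, or as a prefix followed by a "/" boundary,
// so "/mcp2" is not treated as being under "/mcp".
function isUnderBasePath(pathname: string, basePath: string): boolean {
  return pathname === basePath || pathname.startsWith(basePath + "/");
}
```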
Errors from parseBody (invalid JSON, body too large) were falling
through to the generic catch block and returning 500 Internal Server
Error. Now we catch these explicitly and return appropriate status:
- Invalid JSON: 400 Bad Request with JSON-RPC error -32700
- Body too large: 413 Payload Too Large with JSON-RPC error -32600

This prevents client-side mistakes from looking like server failures
and avoids noisy retries.
When using --save-content, file.path from crawled URLs could contain
.. segments or be absolute paths, potentially writing outside the
target directory. Now we:
1. Normalize the path to resolve .. segments
2. Strip any leading .. sequences
3. Verify the resolved path stays within the target directory
4. Skip files with unsafe paths (with a warning)
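A containment check along the lines of steps 1-3 might look like this (hypothetical helper, assuming Node's path module):

```typescript
import * as path from "node:path";

// Resolve the candidate path against the target directory and reject
// anything that escapes it via ".." segments or an absolute path.
function safeJoin(dir: string, candidate: string): string | null {
  const resolvedDir = path.resolve(dir);
  const resolved = path.resolve(resolvedDir, candidate);
  if (resolved === resolvedDir || resolved.startsWith(resolvedDir + path.sep)) {
    return resolved;
  }
  return null; // unsafe: caller skips the file with a warning
}
```

Resolving first and then checking the prefix (with a separator boundary) avoids both `..` traversal and sibling-directory matches like `/tmp/out2`.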
If the S3 URL ends with a trailing slash, indexKey would become an
empty string, yielding an invalid index name. Now we:
1. Trim trailing slashes from the key path
2. Validate that slashIdx is not -1 (handle s3://bucket without path)
3. Throw a clear error if the index key is missing
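The parsing steps above can be sketched like this (a hypothetical parser, not the actual code):

```typescript
// Parse "s3://bucket/path/to/key", trimming trailing slashes and
// rejecting URLs that carry no index key.
function parseS3IndexUrl(url: string): { bucket: string; prefix: string; indexKey: string } {
  const withoutScheme = url.replace(/^s3:\/\//, "");
  const trimmed = withoutScheme.replace(/\/+$/, ""); // drop trailing slashes
  const slashIdx = trimmed.indexOf("/");
  if (slashIdx === -1) throw new Error(`Missing index key in S3 URL: ${url}`);
  const bucket = trimmed.slice(0, slashIdx);
  const keyPath = trimmed.slice(slashIdx + 1);
  const lastSlash = keyPath.lastIndexOf("/");
  const prefix = lastSlash === -1 ? "" : keyPath.slice(0, lastSlash + 1);
  const indexKey = keyPath.slice(lastSlash + 1);
  if (!indexKey) throw new Error(`Missing index key in S3 URL: ${url}`);
  return { bucket, prefix, indexKey };
}
```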
Instead of string matching on error messages to determine status codes,
use a custom HttpError class with a statusCode property. This is more
robust and idiomatic TypeScript.
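The HttpError pattern described above can be sketched as follows (illustrative, not the actual implementation):

```typescript
// Carry the HTTP status on the error itself instead of string-matching
// error messages in the catch block.
class HttpError extends Error {
  readonly statusCode: number;
  constructor(message: string, statusCode: number) {
    super(message);
    this.name = "HttpError";
    this.statusCode = statusCode;
  }
}

// A generic handler can then map any error to a response status.
function statusFor(err: unknown): number {
  return err instanceof HttpError ? err.statusCode : 500;
}
```

A body-size violation would throw `new HttpError("Request body too large", 413)`, while unexpected failures still fall through to 500.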
igor0 commented Jan 9, 2026

augment review

@igor0 igor0 requested a review from richhankins January 9, 2026 23:51
@augment-app-staging augment-app-staging bot left a comment

Review completed. 5 suggestions posted.

for (let i = 0; i < filteredEntries.length; i++) {
const entry = filteredEntries[i];
const entrySize = entry.path.length + entry.type.length + 5; // " [type]\n"
if (estimatedSize + entrySize > maxOutputLength) {

If maxOutputLength is smaller than the first entry’s estimated size, this returns an empty array, and formatListOutput will report “No files found” even though entries exist (they were just truncated). Callers also can’t tell truncation happened since truncated/omittedCount aren’t returned.

I think that's OK

// Build output with gaps shown as "..."
const sortedLines = Array.from(lineNumbers).sort((a, b) => a - b);
const outputLines: string[] = [];
let lastLine = -2; // -2 so first line doesn't trigger gap

With lastLine = -2, the lineIdx > lastLine + 1 check is true for the first emitted line (e.g., 0 > -1), so search output will always start with ... despite the comment. This can imply leading content was omitted even when the first match/context line is at the top of the range.

addressed


// Add files to index
if (files.length > 0) {
await this.addToIndex(context, files);

When fullIndex reuses previousState for dedup, it only calls addToIndex(files) and never removes documents for files that no longer exist in the source, so deleted files may remain searchable while filesRemoved is always reported as 0. This seems especially likely on full-refresh paths like force-push handling.

addressed... this was an over-optimization. when we are doing a full reindex (e.g., due to force push), we shouldn't try to reuse the existing index

}

private getStateKey(key: string): string {
return `${this.prefix}${key}/${STATE_FILENAME}`;

S3 object keys are built via string concatenation (${this.prefix}${key}/…); if prefix is configured without a trailing / you’ll generate unexpected keys, and if key contains / it can break list()/delete() behavior with Delimiter: '/'. That can make some indexes undiscoverable or hard to delete depending on naming.

addressed. prefix is now normalized to always end with /

const indexes: IndexInfo[] = [];
const validIndexNames: string[] = [];
for (const name of indexNames) {
const state = await store.loadSearch(name);

The comment says this loop “filter[s] out any that fail to load”, but any exception thrown by store.loadSearch(name) will abort initialization rather than skipping that index. A single corrupted/partially-written state could take down the whole multi-index runner.

added try-catch to skip any indexes that caused a crash

igor0 added 4 commits January 10, 2026 00:04
…ine 1

With lastLine = -2, the check 'lineIdx > lastLine + 1' was true for the
first emitted line at index 0 (0 > -1), so search output always started
with '...' implying content was omitted even when the first match/context
was at the top of the file. Change to -1 so only lines after index 0
trigger the gap marker.
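The corrected gap-marker loop can be sketched as follows (function and variable names are illustrative):

```typescript
// Render selected line indices, inserting "..." wherever a gap exists.
// lastLine starts at -1, so a first match at index 0 is contiguous and
// emits no leading gap marker.
function renderWithGaps(allLines: string[], lineNumbers: Set<number>): string[] {
  const sorted = Array.from(lineNumbers).sort((a, b) => a - b);
  const out: string[] = [];
  let lastLine = -1;
  for (const lineIdx of sorted) {
    if (lineIdx > lastLine + 1) out.push("...");
    out.push(allLines[lineIdx]);
    lastLine = lineIdx;
  }
  return out;
}
```

A match starting at index 0 now begins cleanly, while a match starting later still gets a leading "..." to show that content above it was omitted.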
When fetchChanges() returns null (e.g., force push), the previous code
reused previousState for client-side deduplication. This caused deleted
files to remain searchable since only addToIndex() was called, never
removeFromIndex().

Fix: Always create a fresh context for full re-index. This ensures the
index accurately reflects the current source state. Removed the unused
previousState parameter from fullIndex() since it's now always null.
1. Normalize prefix to always end with '/' (unless empty) to prevent
   malformed keys like 'context-connectorsmykey/state.json'

2. Use sanitizeKey() to sanitize index keys, replacing '/' with '_'
   to prevent issues with S3 Delimiter '/' in list() that would make
   nested keys undiscoverable

3. Add guard in delete() to reject empty/unsafe keys that sanitize
   to empty string (consistent with FilesystemStore)
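Items 1 and 2 above can be sketched as two small helpers (names taken from the commit message; the actual implementation may differ):

```typescript
// 1. Normalize a non-empty prefix to always end with "/".
function normalizePrefix(prefix: string): string {
  if (prefix === "") return "";
  return prefix.endsWith("/") ? prefix : prefix + "/";
}

// 2. Replace "/" in index keys so they never nest below the S3
//    Delimiter boundary used by list().
function sanitizeKey(key: string): string {
  return key.replace(/\//g, "_");
}
```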
Wrap store.loadSearch() in try-catch so that corrupted or partially-written
state files don't abort initialization of the entire multi-index runner.
This matches the comment's stated intent of 'filtering out any that fail
to load'.
igor0 commented Jan 10, 2026

augment review

@augment-app-staging augment-app-staging bot left a comment

Review completed. 3 suggestions posted.

estimatedSize += entrySize;
}

// Add truncation info to last entry if needed (for display purposes)

listFiles() may truncate entries based on maxOutputLength, but it returns only FileInfo[] so callers can’t tell truncation occurred (and the comment implies comparing to a “reported total count”, which isn’t returned). Consider returning metadata (e.g., truncated/omittedCount/total) so consumers don’t assume the listing is complete.

addressed

}

async delete(key: string): Promise<void> {
// Guard against empty/unsafe keys

delete() rejects keys that sanitize to an empty string, but save()/loadState()/loadSearch() will still accept such keys and write objects that can’t be deleted via this API. Consider validating sanitizeKey(key) consistently across all operations to avoid creating undeletable state/search objects.

addressed - added checks to the other functions

next: NextFunction
) {
try {
const signature = req.headers["x-hub-signature-256"] as string;

req.headers[...] can be string | string[] in Express/Node; casting to string here can yield an array at runtime and break signature/event validation. Consider normalizing/rejecting multi-valued headers explicitly (similar to the auth middleware in mcp-http-server.ts).

addressed

igor0 added 3 commits January 10, 2026 05:43
Changed listFiles() return type from Promise<FileInfo[]> to Promise<ListFilesResult>
so callers can detect when output was truncated. The ListFilesResult includes:
- entries: FileInfo[]
- truncated?: boolean
- omittedCount?: number

Updated formatListOutput() to accept ListFilesResult and display a truncation
notice when entries were omitted due to maxOutputLength.
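The result shape and display logic described above might look roughly like this (field names come from the commit message; the surrounding types are assumptions):

```typescript
interface FileInfo {
  path: string;
  type: string;
}

interface ListFilesResult {
  entries: FileInfo[];
  truncated?: boolean;
  omittedCount?: number;
}

// Render entries, appending a truncation notice when output was cut off.
function formatListOutput(result: ListFilesResult): string {
  const lines = result.entries.map((e) => `${e.path} [${e.type}]`);
  if (result.truncated) {
    lines.push(`... (${result.omittedCount ?? 0} entries omitted)`);
  }
  return lines.length > 0 ? lines.join("\n") : "No files found";
}
```

Returning the truncation flags alongside the entries lets callers distinguish "no files" from "files omitted for length".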
Added validateAndSanitizeKey() helper that throws on empty sanitized keys.
Used by getStateKey() and getSearchKey() which are called by all operations
(save, loadState, loadSearch, delete). This prevents creating objects that
cannot be deleted via this API.
Express/Node headers can be string | string[]. Previously the code cast them
directly to string which could break signature/event validation at runtime.
Now explicitly check for and reject array-valued headers, consistent with
the auth middleware in mcp-http-server.ts.
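The normalization described above can be sketched as a small helper (illustrative name): Node exposes headers as string | string[] | undefined, so multi-valued headers are rejected rather than cast.

```typescript
// Return the header value only if it is a single string; treat missing
// or array-valued headers as invalid.
function singleHeader(value: string | string[] | undefined): string | null {
  return typeof value === "string" ? value : null;
}
```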
igor0 commented Jan 10, 2026

augment review

@augment-app-staging augment-app-staging bot left a comment

Review completed. 4 suggestions posted.

const { data } = await octokit.repos.getCommit({
owner: this.owner,
repo: this.repo,
ref: this.ref,

resolveRefToSha() calls repos.getCommit({ ref }), but ref can be the default "HEAD" (see constructor/CLI/docs) and GitHub’s commits API expects a branch/tag/SHA—"HEAD" is likely to 404 and make default indexing fail. Consider mapping "HEAD" to the repo’s default-branch head before calling getCommit.

Other Locations
  • context-connectors/src/sources/github.ts:111
  • context-connectors/src/bin/cmd-index.ts:67
  • context-connectors/README.md:98

Tested it and there is no problem: "my tests show that in practice GitHub's API does accept "HEAD" and resolves it correctly. This is likely because the Octokit library or GitHub's API internally resolves "HEAD" to the default branch."

const { data } = await octokit.repos.getContent({
owner: this.owner,
repo: this.repo,
path: directory,

When listing the repo root, this passes path: "" into repos.getContent(), which can become /contents/ and may 404 (the root contents endpoint is typically /contents without a {path}). This could make listFiles() fail for the root directory.

@igor0 igor0 Jan 10, 2026
This is not an issue:

Findings:
• The review comment is incorrect. The GitHub API and Octokit library handle path: "" correctly for listing root directory contents.
• The Octokit library properly constructs the URL, avoiding the trailing slash issue the reviewer was concerned about.
• The existing code works as intended.

* const store: IndexStore = new FilesystemStore();
*
* // Indexer uses full interface
* await store.save("my-project", indexState);

The docs example calls store.save("my-project", indexState), but IndexStore.save now requires (key, fullState, searchState); similarly, the FilesystemStore doc block still references store.load()/store.exists() APIs that no longer exist. This mismatch will likely confuse store implementers/users.

Other Locations
  • context-connectors/src/stores/filesystem.ts:24

updated the docs

}

const {
directory = "",

directory is passed through without normalization, so inputs like "/src/" or "./src" can produce double slashes or leading slashes and may break some Source.listFiles implementations. It might be worth normalizing directory similarly to readFile’s path handling.

addressed - added normalization

igor0 added 2 commits January 10, 2026 18:10
Update code examples in types.ts and filesystem.ts to match the current API:
- save() now requires (key, fullState, searchState) not (key, indexState)
- load() no longer exists; use loadState() and loadSearch() instead
- Added list() to the example for completeness
Add normalizePath() utility that removes leading './' and '/', trailing '/',
and collapses multiple slashes. This prevents issues with Source.listFiles()
implementations that may not handle paths like '/src/', './src', or 'src//lib'
correctly.
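A utility matching the behavior described in the commit message might look like this (the actual implementation may differ):

```typescript
// Normalize a directory path: collapse repeated slashes, strip a
// leading "./" or "/", and strip any trailing "/".
function normalizePath(input: string): string {
  return input
    .replace(/\/{2,}/g, "/") // collapse multiple slashes
    .replace(/^\.\//, "")    // strip leading "./"
    .replace(/^\/+/, "")     // strip leading "/"
    .replace(/\/+$/, "");    // strip trailing "/"
}
```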
igor0 commented Jan 10, 2026

augment review

@augment-app-staging augment-app-staging bot left a comment

Review completed. 3 suggestions posted.

* Resolve ref (branch/tag/HEAD) to commit SHA
*/
private async resolveRefToSha(): Promise<string> {
if (this.resolvedRef) {

resolveRefToSha() caches resolvedRef, which means a GitHubSource created with ref: "HEAD" (or a branch/tag) will keep using the first resolved SHA across later index() calls and may never observe new commits. Consider only caching when ref is already a commit SHA (or otherwise re-resolving per run).

Other Locations
  • context-connectors/src/sources/gitlab.ts:113
  • context-connectors/src/sources/bitbucket.ts:84

this is OK... the alternative would be that the SHA changes across calls, which would confuse some callers. documented

} else if (typeof req.body === "string") {
body = req.body;
} else {
body = JSON.stringify(req.body);

If req.body is an object (e.g. after express.json()), JSON.stringify(req.body) generally won’t match GitHub’s original raw payload bytes, so verifyWebhookSignature will consistently fail even with the correct secret. It may be safer to reject non-raw bodies and require express.raw() (or capture the raw body via the JSON middleware verify option) for signature validation.

reject(new HttpError(`Request body too large (max ${MAX_BODY_SIZE} bytes)`, 413));
return;
}
body += chunk;

body += chunk implicitly decodes each Buffer chunk to UTF‑8 separately, which can corrupt multibyte characters split across chunks and lead to intermittent JSON parse failures. Collecting Buffers and decoding once (or calling req.setEncoding("utf8") and treating chunks as strings) avoids this.

addressed

igor0 added 3 commits January 10, 2026 20:56
Collect Buffer chunks and decode once at the end instead of using
'body += chunk' which implicitly decodes each chunk separately.
This prevents corruption of multibyte UTF-8 characters that may be
split across chunk boundaries.
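The decode-once pattern described above can be sketched as follows (a minimal sketch; the real code reads from an http.IncomingMessage, and the size check is elided here):

```typescript
import { Readable } from "node:stream";

// Collect Buffer chunks and decode once after concatenation, so a
// multibyte UTF-8 sequence split across chunk boundaries is decoded
// correctly instead of being corrupted chunk-by-chunk.
async function readBody(stream: Readable): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks).toString("utf8");
}
```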
The resolveRefToSha() method caches the resolved SHA for the lifetime of
the instance to ensure consistency across multiple operations. Added
documentation explaining this behavior and recommending users create a
new source instance to pick up new commits.
JSON.stringify(req.body) won't match GitHub's original payload bytes
(due to key ordering, whitespace, unicode escaping differences), so
signature verification would consistently fail. Now requires raw body
(Buffer from express.raw() or string) and returns 400 error with a
helpful message if an object body is detected.
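GitHub-style signature verification over the raw payload can be sketched like this (illustrative, not the actual handler), which shows why a re-serialized req.body cannot work: the HMAC must be computed over the exact bytes GitHub signed.

```typescript
import * as crypto from "node:crypto";

// Verify an X-Hub-Signature-256 header against the raw request body,
// using a constant-time comparison.
function verifyWebhookSignature(
  rawBody: Buffer | string,
  secret: string,
  signatureHeader: string
): boolean {
  const expected =
    "sha256=" + crypto.createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```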